Quick Introduction to Pandas

About Me:
Aditya Laghate aka @ThinRhino
A pseudo geek - Currently learning Math, Statistics & Data Analytics



In [1]:

    
# imports
from pandas import Series, DataFrame
import pandas as pd

pandas provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. The primary object in pandas that will be used in this book is the DataFrame, a two- dimensional tabular, column-oriented data structure with both row and column labels. pandas combines the high performance array-computing features of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases (such as SQL). It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data.

Pandas: Basic datastructes: Series

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.



In [2]:

    
# Series
obj = Series([-4, 3, 6, 6])
obj









    Out[2]:





0   -4
1    3
2    6
3    6
dtype: int64



In [3]:

    
obj.values









    Out[3]:





array([-4,  3,  6,  6])



In [4]:

    
obj.index









    Out[4]:





Int64Index([0, 1, 2, 3], dtype=int64)



In [5]:

    
obj2 = Series([2, 5, -5, 3], index=['a', 'b', 'c', 'd'])
obj2









    Out[5]:





a    2
b    5
c   -5
d    3
dtype: int64



In [6]:

    
obj2.values, obj2.index









    Out[6]:





(array([ 2,  5, -5,  3]), Index([u'a', u'b', u'c', u'd'], dtype=object))



In [8]:

    
# NumPy array operations
obj2['a']









    Out[8]:





2



In [9]:

    
obj2 * 2









    Out[9]:





a     4
b    10
c   -10
d     6
dtype: int64



In [10]:

    
# Converting Python dict to a series

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000,}
obj3 = Series(sdata)
obj3









    Out[10]:





Ohio      35000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64



In [11]:

    
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4









    Out[11]:





California      NaN
Ohio          35000
Oregon        16000
Texas         71000
dtype: float64



In [13]:

    
obj4.dropna()









    Out[13]:





Ohio      35000
Oregon    16000
Texas     71000
dtype: float64



In [14]:

    
obj3 + obj4









    Out[14]:





California       NaN
Ohio           70000
Oregon         32000
Texas         142000
Utah             NaN
dtype: float64

DataFrame



In [15]:

    
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'year': [2000, 2001, 2002, 2001, 2002],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame



In [16]:

    
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],index=['one', 'two', 'three', 'four', 'five'])
frame2



In [17]:

    
frame2['debt'] = 16.5
frame2



In [18]:

    
frame2['debt'] = np.arange(5.)
frame2



In [19]:

    
val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
frame2



In [20]:

    
frame2['eastern'] = frame2.state == 'Ohio'
frame2









    Out[20]:






  
    
      
      year
      state
      pop
      debt
      eastern
    
  
  
    
      one
       2000
         Ohio
       1.5
       NaN
        True
    
    
      two
       2001
         Ohio
       1.7
      -1.2
        True
    
    
      three
       2002
         Ohio
       3.6
       NaN
        True
    
    
      four
       2001
       Nevada
       2.4
      -1.5
       False
    
    
      five
       2002
       Nevada
       2.9
      -1.7
       False



In [21]:

    
population = {'Nevada': {2001: 2.4, 2002: 2.9},
              'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(population)
frame3



In [22]:

    
frame3.T

Examples + Functionality



In [23]:

    
# input data
# Cannot share data!
df = pd.read_excel('/Users/aditya/PLUG/pandas/chap_04.xls', 'Sheet1')
df









    Out[23]:




<class 'pandas.core.frame.DataFrame'>
Int64Index: 16416 entries, 0 to 16415
Data columns (total 9 columns):
id           16416  non-null values
to_port      16416  non-null values
from_port    16416  non-null values
hs_code      16416  non-null values
descp        16416  non-null values
qty          16416  non-null values
unit         16416  non-null values
value        16416  non-null values
date         16416  non-null values
dtypes: float64(4), object(5)



In [24]:

    
df.head()









    Out[24]:






  
    
      
      id
      to_port
      from_port
      hs_code
      descp
      qty
      unit
      value
      date
    
  
  
    
      0
       13992
           DUBAI
       INAMD4
       4059090
                                               MILK MAWA
       500
       KGS
        88698.75
       2013-02-01
    
    
      1
       30310
           DUBAI
       INAMD4
       4059090
                                               MILK MAWA
       700
       KGS
       119600.25
       2012-11-02
    
    
      2
       37135
         BANGKOK
       INAMD4
       4039090
                                                CREAM-35
         2
       NOS
        42504.00
       2012-10-03
    
    
      3
       47681
          MAPUTO
       INAMD4
       4089900
                                                EMU EGGS
        18
       PCS
        21231.30
       2013-01-05
    
    
      4
       53994
       CAPE TOWN
       INAMD4
       4051000
       PEANUT BUTTER SMOOTH HAVING NO. COMM. VALUE VA...
        13
       PAC
         8898.89
       2012-10-05



In [25]:

    
# Change index and convert string to datetime
df.index = pd.to_datetime(df.pop('date'), format='%Y-%m-%d')



In [26]:

    
df









    Out[26]:




<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16416 entries, 2013-02-01 00:00:00 to 2012-01-08 00:00:00
Data columns (total 8 columns):
id           16416  non-null values
to_port      16416  non-null values
from_port    16416  non-null values
hs_code      16416  non-null values
descp        16416  non-null values
qty          16416  non-null values
unit         16416  non-null values
value        16416  non-null values
dtypes: float64(4), object(4)



In [27]:

    
df.head()









    Out[27]:






  
    
      
      id
      to_port
      from_port
      hs_code
      descp
      qty
      unit
      value
    
    
      date
      
      
      
      
      
      
      
      
    
  
  
    
      2013-02-01
       13992
           DUBAI
       INAMD4
       4059090
                                               MILK MAWA
       500
       KGS
        88698.75
    
    
      2012-11-02
       30310
           DUBAI
       INAMD4
       4059090
                                               MILK MAWA
       700
       KGS
       119600.25
    
    
      2012-10-03
       37135
         BANGKOK
       INAMD4
       4039090
                                                CREAM-35
         2
       NOS
        42504.00
    
    
      2013-01-05
       47681
          MAPUTO
       INAMD4
       4089900
                                                EMU EGGS
        18
       PCS
        21231.30
    
    
      2012-10-05
       53994
       CAPE TOWN
       INAMD4
       4051000
       PEANUT BUTTER SMOOTH HAVING NO. COMM. VALUE VA...
        13
       PAC
         8898.89



In [28]:

    
# Monthly mean
mth_mean = df.value.resample('M', how=['mean', 'median', 'std'])
mth_mean.head()









    Out[28]:






  
    
      
      mean
      median
      std
    
    
      date
      
      
      
    
  
  
    
      2012-01-31
        959229.921205
       386905.455
       1285043.331985
    
    
      2012-02-29
       1058314.492339
       368042.130
       1478173.074670
    
    
      2012-03-31
       1176705.192077
       436196.825
       1573295.375182
    
    
      2012-04-30
       1493411.445170
       754509.750
       1883248.945086
    
    
      2012-05-31
       1064001.441176
       307954.995
       1581324.286834



In [29]:

    
mth_mean.plot()









    Out[29]:





<matplotlib.axes.AxesSubplot at 0x108d9e690>



In [30]:

    
hs_code = df.groupby([df.hs_code, df.index])['value'].mean()
hs_code









    Out[30]:





hs_code  date      
4011000  2012-02-03      2780.780000
         2012-03-08    246420.000000
         2012-03-15    352836.000000
         2012-06-11    187621.000000
         2012-07-11     38397.333333
         2012-07-19    117491.500000
         2012-07-26    131503.133333
         2012-08-04    159472.060000
         2012-08-16    375526.000000
         2012-09-01      4825.960000
         2012-09-07    105120.936667
         2012-09-12    137767.000000
         2012-10-12     67138.000000
         2012-10-16    143940.800000
         2012-11-02    280596.600000
...
4090000  2013-09-21    2063096.503333
         2013-09-22      25512.810000
         2013-09-23    3739585.000000
         2013-09-24    1851325.433333
4100020  2013-01-27    3195802.400000
4100090  2012-07-27      79242.500000
         2012-10-16      28404.975000
         2012-12-29      26002.660000
         2013-02-14    1284040.730000
         2013-02-20     201984.230000
         2013-03-21       9887.370000
         2013-06-18        770.740000
         2013-07-23       1788.830000
         2013-08-26     429075.250000
         2013-09-19       7419.510000
Name: value, Length: 4861, dtype: float64



In [31]:

    
df.from_port.unique()









    Out[31]:





array([u'INAMD4', u'INASR6', u'INBFR6', u'INBLJ6', u'INBLR4', u'INBOM4',
       u'INBRC6', u'INCCJ4', u'INCCU1', u'INCCU4', u'INCOK1', u'INCOK4',
       u'INCPL6', u'INDEL4', u'INDER6', u'INDIG6', u'INGHR6', u'INHYD4',
       u'INIGU6', u'ININD6', u'INIXY1', u'INJAI4', u'INJNR6', u'INJUC6',
       u'INKKU6', u'INKNU6', u'INLDH6', u'INLON6', u'INMAA1', u'INMAA4',
       u'INMUL6', u'INMUN1', u'INNGP6', u'INNML1', u'INNSA1', u'INPAV1',
       u'INPNK6', u'INPPG6', u'INPTL6', u'INPTPB', u'INRXLB', u'INSAU6',
       u'INSBI6', u'INSLL6', u'INSNF6', u'INSTT6', u'INTDE6', u'INTKD6',
       u'INTLG6', u'INTRV4', u'INTUT1', u'INTUT6', u'INTVT6', u'INVTZ1',
       u'INWFD6', u'INCJB4'], dtype=object)



In [33]:

    
grp_from_port = df['value'].groupby(df.from_port).mean()
grp_from_port









    Out[33]:





from_port
INAMD4          50343.243529
INASR6          80679.493333
INBFR6       17108924.530500
INBLJ6       21627388.000000
INBLR4         936275.835472
INBOM4         456250.207978
INBRC6           3216.150000
INCCJ4        1410133.738500
INCCU1        1752904.833333
INCCU4          79375.828169
INCJB4            216.140000
INCOK1        4153438.319750
INCOK4        1318740.454655
INCPL6        3257838.265639
INDEL4         127494.707753
INDER6        4367968.702348
INDIG6        3699249.951371
INGHR6        3106688.736522
INHYD4         197461.645135
INIGU6         556976.160000
ININD6         175183.656667
INIXY1          82307.460000
INJAI4           6162.625000
INJNR6         272113.560000
INJUC6          80625.532222
INKKU6         103572.702381
INKNU6        1585317.140000
INLDH6        2001159.927033
INLON6        4145742.874536
INMAA1        1937629.396295
INMAA4         168402.481327
INMUL6        5142684.375000
INMUN1         241851.347842
INNGP6         158273.442644
INNML1          53109.305000
INNSA1        2634457.914446
INPAV1           1659.690000
INPNK6        1800815.610000
INPPG6         126199.658000
INPTL6        3601896.675000
INPTPB         646107.514316
INRXLB         324754.218339
INSAU6        8978054.000000
INSBI6        3996768.823058
INSLL6         932813.600294
INSNF6        2284636.987692
INSTT6        1981081.200000
INTDE6         334989.260000
INTKD6        3853034.473504
INTLG6         851770.669545
INTRV4          38729.078286
INTUT1        1660852.233313
INTUT6         268119.810000
INTVT6        2538729.798889
INVTZ1         403831.782000
INWFD6        2736054.635055
Length: 56, dtype: float64



In [34]:

    
grp_from_port[:10].plot(kind='barh',color='k', alpha=0.7)









    Out[34]:





<matplotlib.axes.AxesSubplot at 0x109123b10>



In [35]:

    
from_port = df.groupby(df.from_port)['value'].mean()
from_port[:10].plot(kind='bar')









    Out[35]:





<matplotlib.axes.AxesSubplot at 0x108d90e10>



In [36]:

    
new_df = DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                    'key2': ['one', 'two', 'one', 'two', 'one'],
                    'data1': np.random.randn(5),
                    'data2': np.random.randn(5)})
new_df



In [37]:

    
grouped = new_df['data1'].groupby(new_df['key1'])
grouped









    Out[37]:





<pandas.core.groupby.SeriesGroupBy object at 0x109136cd0>



In [38]:

    
grouped.mean()









    Out[38]:





key1
a      -0.548931
b      -0.049568
dtype: float64

Other Datasets



In [39]:

    
# https://github.com/pydata/pydata-book/blob/master/ch08/tips.csv

tips = pd.read_csv('/Users/aditya/Repo/pydata-book/ch08/tips.csv')
# tips.describe()
tips









    Out[39]:






  
    
      
      total_bill
      tip
      sex
      smoker
      day
      time
      size
    
  
  
    
      0
       16.99
       1.01
       Female
       No
       Sun
       Dinner
       2
    
    
      1
       10.34
       1.66
         Male
       No
       Sun
       Dinner
       3
    
    
      2
       21.01
       3.50
         Male
       No
       Sun
       Dinner
       3
    
    
      3
       23.68
       3.31
         Male
       No
       Sun
       Dinner
       2
    
    
      4
       24.59
       3.61
       Female
       No
       Sun
       Dinner
       4



In [45]:

    
tips.tail(10)



In [46]:

    
# Add tips percentage of total bill
tips['tip_pct'] = tips.tip / tips.total_bill
tips.head()



In [47]:

    
grouped = tips.groupby([tips.sex, tips.smoker])
grouped_pct = grouped['tip_pct']
grouped_pct.agg('mean')









    Out[47]:





sex     smoker
Female  No        0.156921
        Yes       0.182150
Male    No        0.160669
        Yes       0.152771
Name: tip_pct, dtype: float64



In [48]:

    
functions = ['count', 'mean', 'max']
result = grouped['tip_pct', 'total_bill'].agg(functions)
result



In [49]:

    
def top(df, n=5, column='tip_pct'):
    return df.sort_index(by=column)[-n:]

top(tips, n=10)



In [50]:

    
tips.groupby('smoker').apply(top)



In [51]:

    
tips.groupby(['smoker', 'day']).apply(top, n=1, column='total_bill')

More Plots



In [52]:

    
df = DataFrame(np.random.rand(6, 4),
               index=['one', 'two', 'three', 'four', 'five', 'six'],
               columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df



In [53]:

    
df.plot(kind='bar')









    Out[53]:





<matplotlib.axes.AxesSubplot at 0x108ba5cd0>



In [54]:

    
df.plot(kind='barh', stacked=True, alpha=0.5)









    Out[54]:





<matplotlib.axes.AxesSubplot at 0x108bd4810>



In [55]:

    
party_counts = pd.crosstab(tips.day, tips.size)
party_counts



In [56]:

    
# Not many parties with size 1(!) and 6
# Eliminating
party_counts = party_counts.ix[:, 2:5]
party_counts



In [57]:

    
# Normalize to sum to 1
party_pcts = party_counts.div(party_counts.sum(1).astype(float), axis=0)
party_pcts



In [58]:

    
party_pcts.plot(kind='barh', stacked=True, alpha=0.4)









    Out[58]:





<matplotlib.axes.AxesSubplot at 0x108c77390>

Missing values



In [59]:

    
s = Series(np.random.randn(6))
s









    Out[59]:





0   -0.024049
1    1.378600
2   -0.391518
3    0.958736
4    0.153957
5   -1.135608
dtype: float64



In [60]:

    
s[::2] = np.nan
s









    Out[60]:





0         NaN
1    1.378600
2         NaN
3    0.958736
4         NaN
5   -1.135608
dtype: float64



In [61]:

    
s.fillna(s.mean())









    Out[61]:





0    0.400576
1    1.378600
2    0.400576
3    0.958736
4    0.400576
5   -1.135608
dtype: float64

Pivot Tables

A pivot table is a data summarization tool frequently found in spreadsheet programs and other data analysis software. It aggregates a table of data by one or more keys, arranging the data in a rectangle with some of the group keys along the rows and some along the columns.



In [62]:

    
tips.pivot_table(rows=['sex', 'smoker'])



In [63]:

    
# To use the aggregation function, pass it to aggfunc. For example, 'count' or len will give you a 
# cross-tabulation (count or frequency) of group sizes

tips.pivot_table('tip_pct', rows=['sex', 'smoker'], cols='day', aggfunc=len, margins=True)

	data1	data2	key1	key2
0	-0.177947	1.585618	a	one
1	-1.883930	-0.746285	a	two
2	-1.262905	-0.072472	b	one
3	1.163769	-0.125257	b	two
4	0.415085	0.787418	a	one

Genus	A	B	C	D
one	0.667401	0.899211	0.751784	0.726842
two	0.842748	0.609299	0.594116	0.709191
three	0.423631	0.283005	0.585332	0.454224
four	0.847370	0.204184	0.169221	0.885130
five	0.754614	0.744178	0.518544	0.995939
six	0.836338	0.652472	0.013708	0.653907

size	2	3	4	5
day
Fri	0.888889	0.055556	0.055556	0.000000
Sat	0.623529	0.211765	0.152941	0.011765
Sun	0.520000	0.200000	0.240000	0.040000
Thur	0.827586	0.068966	0.086207	0.017241

	pop	state	year
0	1.5	Ohio	2000
1	1.7	Ohio	2001
2	3.6	Ohio	2002
3	2.4	Nevada	2001
4	2.9	Nevada	2002

	year	state	pop	debt
one	2000	Ohio	1.5	16.5
two	2001	Ohio	1.7	16.5
three	2002	Ohio	3.6	16.5
four	2001	Nevada	2.4	16.5
five	2002	Nevada	2.9	16.5

	year	state	pop	debt	eastern
one	2000	Ohio	1.5	NaN	True
two	2001	Ohio	1.7	-1.2	True
three	2002	Ohio	3.6	NaN	True
four	2001	Nevada	2.4	-1.5	False
five	2002	Nevada	2.9	-1.7	False

	id	to_port	from_port	hs_code	descp	qty	unit	value	date
0	13992	DUBAI	INAMD4	4059090	MILK MAWA	500	KGS	88698.75	2013-02-01
1	30310	DUBAI	INAMD4	4059090	MILK MAWA	700	KGS	119600.25	2012-11-02
2	37135	BANGKOK	INAMD4	4039090	CREAM-35	2	NOS	42504.00	2012-10-03
3	47681	MAPUTO	INAMD4	4089900	EMU EGGS	18	PCS	21231.30	2013-01-05
4	53994	CAPE TOWN	INAMD4	4051000	PEANUT BUTTER SMOOTH HAVING NO. COMM. VALUE VA...	13	PAC	8898.89	2012-10-05

	id	to_port	from_port	hs_code	descp	qty	unit	value
date
2013-02-01	13992	DUBAI	INAMD4	4059090	MILK MAWA	500	KGS	88698.75
2012-11-02	30310	DUBAI	INAMD4	4059090	MILK MAWA	700	KGS	119600.25
2012-10-03	37135	BANGKOK	INAMD4	4039090	CREAM-35	2	NOS	42504.00
2013-01-05	47681	MAPUTO	INAMD4	4089900	EMU EGGS	18	PCS	21231.30
2012-10-05	53994	CAPE TOWN	INAMD4	4051000	PEANUT BUTTER SMOOTH HAVING NO. COMM. VALUE VA...	13	PAC	8898.89

	mean	median	std
date
2012-01-31	959229.921205	386905.455	1285043.331985
2012-02-29	1058314.492339	368042.130	1478173.074670
2012-03-31	1176705.192077	436196.825	1573295.375182
2012-04-30	1493411.445170	754509.750	1883248.945086
2012-05-31	1064001.441176	307954.995	1581324.286834

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

	total_bill	tip	sex	smoker	day	time	size
234	15.53	3.00	Male	Yes	Sat	Dinner	2
235	10.07	1.25	Male	No	Sat	Dinner	2
236	12.60	1.00	Male	Yes	Sat	Dinner	2
237	32.83	1.17	Male	Yes	Sat	Dinner	2
238	35.83	4.67	Female	No	Sat	Dinner	3
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

		tip_pct			total_bill
		count	mean	max	count	mean	max
sex	smoker
Female	No	54	0.156921	0.252672	54	18.105185	35.83
Female	Yes	33	0.182150	0.416667	33	17.977879	44.30
Male	No	97	0.160669	0.291990	97	19.791237	48.33
Male	Yes	60	0.152771	0.710345	60	22.284500	50.81

	total_bill	tip	sex	smoker	day	time	size	tip_pct
51	10.29	2.60	Female	No	Sun	Dinner	2	0.252672
221	13.42	3.48	Female	Yes	Fri	Lunch	2	0.259314
93	16.32	4.30	Female	Yes	Fri	Dinner	2	0.263480
149	7.51	2.00	Male	No	Thur	Lunch	2	0.266312
109	14.31	4.00	Female	Yes	Sat	Dinner	2	0.279525
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535
232	11.61	3.39	Male	No	Sat	Dinner	2	0.291990
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733
178	9.60	4.00	Female	Yes	Sun	Dinner	2	0.416667
172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345

		total_bill	tip	sex	smoker	day	time	size	tip_pct
smoker
No	88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746
	185	20.69	5.00	Male	No	Sun	Dinner	5	0.241663
	51	10.29	2.60	Female	No	Sun	Dinner	2	0.252672
	149	7.51	2.00	Male	No	Thur	Lunch	2	0.266312
	232	11.61	3.39	Male	No	Sat	Dinner	2	0.291990
Yes	109	14.31	4.00	Female	Yes	Sat	Dinner	2	0.279525
	183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535
	67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733
	178	9.60	4.00	Female	Yes	Sun	Dinner	2	0.416667
	172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345

			total_bill	tip	sex	smoker	day	time	size	tip_pct
smoker	day
No	Fri	94	22.75	3.25	Female	No	Fri	Dinner	2	0.142857
	Sat	212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220
	Sun	156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799
	Thur	142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389
Yes	Fri	95	40.17	4.73	Male	Yes	Fri	Dinner	4	0.117750
	Sat	170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812
	Sun	182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178
	Thur	197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982

		size	tip	tip_pct	total_bill
sex	smoker
Female	No	2.592593	2.773519	0.156921	18.105185
Female	Yes	2.242424	2.931515	0.182150	17.977879
Male	No	2.711340	3.113402	0.160669	19.791237
Male	Yes	2.500000	3.051167	0.152771	22.284500